How normal flow constrains relative depth for an active observer


Liuqing Huang and Yiannis Aloimonos


We present a set of constraints that relate the relative depth of (stationary or moving) objects in the field of view with the spatiotemporal derivatives of the time-varying image intensity function. The constraints are purposive in the sense that they can be used only for the relative depth from motion problem and not in other problems related to motion (i.e. they lack generality). In addition, they show that relative depth could be obtained without having to go through the intermediate step of fully recovering 3D motion, as is commonly assumed. Our analysis indicates that exact computation of retinal motion (optic flow or displacements) does not appear to be a necessary first step for some problems related to visual motion, contrary to conventional wisdom. In addition, it is demonstrated that optic flow, whose computation is an ill-posed problem, is related to the motion of the scene only under very restrictive assumptions. This paper is devoted to the discovery of the mathematical constraints relating normal flow and relative depth. The development of algorithms using these constraints, and the study of stability issues of such algorithms, are not discussed here.

Keywords: computer vision, constraints, field of view


The problem of structure from motion has attracted a lot of attention in the past few years because of the general usefulness that a potential solution to this problem would have. Important navigational problems such as detection of independently moving objects by a moving observer, passive navigation, obstacle detection, target pursuit and many other problems related to robotics, teleconferencing, etc. would be simple applications of a structure from motion module. The problem has been formulated as follows: Given a sequence of images taken by a monocular observer (the observer and/or parts of the scene could be moving), to recover the shapes (and relative depths) of the objects in the scene, as well as the (relative) 3D motions of independently moving bodies.

Computer Vision Laboratory, Center for Automation Research, University of Maryland, College Park, MD 20742-3275, USA
Paper received: 11 March 1993; revised paper received: 9 February 1994

The problem has been formulated and usually treated as an aspect of the general task of recovering 3D information from motion. The majority of the proposed solutions to date are based on the following modular approach.

1. First, one computes the optic flow on the image plane, i.e. the velocity with which every image point appears to be moving. (For clarity, we consider only the differential case. In the case of long range motion one computes discrete displacements, but the analysis remains essentially the same.)

2. Then segmentation of the flow field is performed and different moving objects are identified on the image plane. From the segmented optic flow one then computes the 3D motion with which each visible surface is moving relative to the observer. (Assuming that an object moves rigidly, a monocular observer can only compute its direction of translation and its rotation, but not its speed.)

3. Finally, using the values of the optic flow, along with the results of the previous step, one computes the surface normal at each point, or equivalently, the ratio $Z_i/Z_j$ of the depths of any two points $i$ and $j$.

The reason that most approaches have followed the above three-step approach is two-fold. The first is due to the formulation of the problem, which insists on recovering a complete relative depth map and accurate three-dimensional motion. The second is due to the fact that the constraints relating retinal motion to three-dimensional structure involve 3D motion in a nonlinear manner that does not allow separability. For examples of such approaches, see elsewhere. However, the past work in this paradigm, despite its mathematical elegance, is far from being useful in real-time navigational systems, and such techniques have found few or no practical applications (possible exceptions are photogrammetry and semiautonomous applications requiring a human operator). Consequently, this approach cannot be used to explain the ability of biological organisms to handle visual motion.

There exist many reasons for the limitations of the optic flow approach, related to all three steps listed above. To begin, the computation of optic flow is an ill-posed problem, i.e. unless we impose additional constraints, we cannot estimate it. Such constraints, however, impose a relationship on the values of the flow field which is translated into an assumption about the scene in view (for example, that it is smooth). Thus, even if we are capable of obtaining an algorithm that computes optic flow in a robust manner, the algorithm will work only for a restricted set of scenes. The only available constraint at every point $(x, y)$ of the changing image $I(x, y, t)$ for the flow $(u, v)$ is the constraint $I_x u + I_y v + I_t = 0$, where the subscripts denote partial differentiation. This means that we can only compute the projection of the flow on the gradient direction, $(I_x, I_y) \cdot (u, v) = -I_t$, i.e. the so-called normal flow.

More graphically, it means that if a feature (for example, an edge segment) in the image moves to a new position, we do not know where every point of the segment moved to (see Figure 1); we only know the normal flow, i.e. the projection of the flow on the image gradient at that point.
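To make the aperture problem concrete, the following minimal Python sketch (our illustration, not part of the original method) takes hypothetical derivative values $I_x$, $I_y$, $I_t$ at a single pixel, computes the normal flow $-I_t/\|\nabla I\|$ along the unit gradient direction, and checks that every flow obtained by sliding the true flow along the edge direction $(-I_y, I_x)$ satisfies the brightness constraint equally well. The function names are illustrative only.

```python
import math

def normal_flow_at_pixel(Ix, Iy, It):
    """Return (vn, (nx, ny)): signed normal-flow magnitude and unit gradient direction.

    Follows from the constraint Ix*u + Iy*v + It = 0, which only fixes the
    component of the flow (u, v) along the gradient (Ix, Iy).
    """
    g = math.hypot(Ix, Iy)
    if g == 0.0:
        raise ValueError("normal flow is undefined where the gradient vanishes")
    return -It / g, (Ix / g, Iy / g)

def brightness_residual(Ix, Iy, It, u, v):
    """Left-hand side of the brightness constraint for a candidate flow (u, v)."""
    return Ix * u + Iy * v + It

# Hypothetical single-pixel measurements produced by a true flow of (0.5, 0.25).
Ix, Iy = 1.0, 2.0
true_u, true_v = 0.5, 0.25
It = -(Ix * true_u + Iy * true_v)

vn, (nx, ny) = normal_flow_at_pixel(Ix, Iy, It)

# Sliding the flow along the edge direction (-Iy, Ix) keeps the constraint satisfied,
# and every such candidate has the same projection on the gradient direction.
for s in (-1.0, 0.0, 2.5):
    u, v = true_u - s * Iy, true_v + s * Ix
    assert abs(brightness_residual(Ix, Iy, It, u, v)) < 1e-12
    assert abs((u * nx + v * ny) - vn) < 1e-12

print(f"normal flow = {vn:.4f} along ({nx:.4f}, {ny:.4f})")
```

Only the gradient-aligned component of the flow is pinned down by the data; the component along the edge is free, which is exactly the ambiguity illustrated in Figure 1.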

A second reason has to do with the very essence of optic flow. An optic flow field is the vector field of apparent velocities that are associated with the variation of brightness on the image plane. Clearly, the scene is not involved in this definition. One would hope that optic flow is equivalent to the so-called motion field, which is the (perspective) projection on the image plane of the three-dimensional velocity field associated with each point of the visible surfaces in the scene. However, the optic flow field and the motion field are not equal in general. Verri and Poggio reported some general results in an attempt to quantify the difference between the optic flow and motion fields. Although we do not yet have necessary and sufficient conditions for the equality of the two fields, it is clear that they are equal only under specific sets of restrictive conditions.

Figure 1 The aperture problem. Point A could have moved to B, C, D or E. However, whatever the value of the image motion vector is, its projection on the normal to a is always AD (known)

A third reason is related to the second step of the existing algorithms for structure from motion. These algorithms attempt to first recover three-dimensional motion before they proceed to recover relative depth, and this problem of 3D motion recovery appears to be very sensitive to small amounts of noise in the input (flow or displacements).

Is it possible to compute relative depth from motion without using optic flow fields (which are difficult to compute and in general not equal to the motion fields), and without having to go through the intermediate stage of 3D motion recovery? If it is, then we have the potential for a more robust algorithm. This is the question we study in this paper. It turns out that it is indeed possible to compute relative depth if we use the spatiotemporal derivatives of the image intensity function and we employ an active observer.


INPUT 

Our motivation is by now clear. We wish to avoid using optic flow as the input to the computation of structure from motion. On the other hand, we must utilize some description of the image motion. As such a description we choose the spatial and temporal derivatives $\partial I/\partial x$, $\partial I/\partial y$, $\partial I/\partial t$ of the image intensity function $I(x, y, t)$.

These quantities define the normal flow at every point, i.e. the projection of the optic flow on the direction of the gradient $(I_x, I_y)$. Clearly, estimating the normal flow is much easier than estimating the actual optic flow. But how is normal flow related to the three-dimensional motion field? Is the normal optic flow field equal to the normal motion field, and under what conditions? This question was addressed by Verri and Poggio.

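Let $I(x, y, t)$ denote the image intensity, and consider the optic flow field $\mathbf{v} = (u, v)$ and the motion field $\tilde{\mathbf{v}} = (\tilde u, \tilde v)$ at a point $(x, y)$ where the local (normalized) intensity gradient is $\mathbf{n} = (I_x, I_y)/\sqrt{I_x^2 + I_y^2}$. The normal motion field at point $(x, y)$ is by definition:

$$ \tilde v_n = \frac{(I_x, I_y)}{\sqrt{I_x^2 + I_y^2}} \cdot \left( \frac{dx}{dt}, \frac{dy}{dt} \right) = \frac{\nabla I}{\|\nabla I\|} \cdot \left( \frac{dx}{dt}, \frac{dy}{dt} \right) $$

Similarly, the normal optic flow is:

$$ v_n = -\frac{I_t}{\|\nabla I\|} $$

Thus, since along the trajectory of an image point $dI/dt = \nabla I \cdot (dx/dt, dy/dt) + I_t$:

$$ \tilde v_n - v_n = \frac{1}{\|\nabla I\|} \, \frac{dI}{dt} $$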

From this equation it follows that if the change of intensity of an image patch during its motion, $dI/dt$, is small enough (which is a reasonable assumption) and the local intensity gradient has a high magnitude, then the normal optic flow and normal motion fields are approximately equal. Thus, provided that we measure normal flow in regions of high local intensity gradients, the normal flow measurements can safely be used for inferring 3D structure.
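The recommendation to measure only where the gradient is strong can be sketched as follows. This minimal NumPy example assumes two frames sampled on the same pixel grid, central spatial differences of the mean frame and a forward temporal difference (our choices, not the authors'). It returns the normal flow $-I_t/\|\nabla I\|$ and the unit gradient direction, and masks pixels whose gradient magnitude falls below a hypothetical threshold `grad_threshold`. A translated linear ramp is used as a check, because its finite differences are exact.

```python
import numpy as np

def normal_flow(I0, I1, dt=1.0, grad_threshold=1.0):
    """Normal flow between two frames, kept only where the gradient is strong.

    Returns (vn, nx, ny): signed normal-flow magnitude and unit gradient
    direction per pixel, with NaN wherever |grad I| <= grad_threshold.
    """
    I0 = np.asarray(I0, dtype=float)
    I1 = np.asarray(I1, dtype=float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(0.5 * (I0 + I1))
    It = (I1 - I0) / dt                        # forward temporal difference
    mag = np.hypot(Ix, Iy)
    strong = mag > grad_threshold              # error term (dI/dt)/|grad I| is small here
    safe = np.where(strong, mag, 1.0)          # avoid division by zero in weak pixels
    vn = np.where(strong, -It / safe, np.nan)
    nx = np.where(strong, Ix / safe, np.nan)
    ny = np.where(strong, Iy / safe, np.nan)
    return vn, nx, ny

# Synthetic check: a linear intensity ramp translated by (u, v) pixels per frame.
ys, xs = np.mgrid[0:20, 0:30].astype(float)
u, v = 0.3, 0.1
I0 = 2.0 * xs + 3.0 * ys
I1 = 2.0 * (xs - u) + 3.0 * (ys - v)
vn, nx, ny = normal_flow(I0, I1)

# The normal flow should equal the projection of (u, v) on the gradient direction (2, 3)/sqrt(13).
expected = (2.0 * u + 3.0 * v) / np.sqrt(13.0)
print(np.allclose(vn, expected))
```

On real images the threshold would have to be tuned to the intensity scale and noise level; the value used here is only a placeholder.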


PREVIOUS  WORK  AND  PURPOSIVE  VISION 

The idea of using the spatiotemporal derivatives of the image intensity function for the solution of the structure from motion problem is not new. (Working with normal flow or with the derivatives of the image is exactly the same thing; the difference is that the use of normal flow provides geometric intuition.) In Aloimonos and Brown the case of rotational motion was examined. In Horn and Weldon and in Negahdaripour the case of translational motion was examined in detail. Elsewhere, the general case was examined for recovering only 3D motion using pattern matching.

In this paper, we take a purposive approach. We would like to compute relative depth from motion without having to go through the estimation of 3D motion and without having to compute optic flow. In simple words, we want a procedure that computes relative depth and is designed only for this problem. Of course, if information about 3D motion is known, it can be effectively utilized in our problem, but this is of no concern to us here. When building a system that can deal with visual motion problems, we can visualize it as consisting of many processes working in a cooperative manner to solve various problems. For example, the theory described in this paper could be used to design a process that computes relative depth from image measurements, independently of the process that computes 3D motion. However, after a number of computational steps, when results about relative depth and 3D motion become available from the two independent processes, they can be exchanged and the constraints relating them can be effectively utilized so that the results are as consistent as possible. Such an approach to building vision systems is less modular than the general recovery approach.

This approach of attempting general solutions to specific problems (purposive vision), as opposed to working towards solutions to general problems (reconstructionist vision), is justified by the potential robustness of the proposed solutions, and is very much needed for the development of successful systems in the real world. Of course, normal flow contains much less information than optical flow, and we cannot expect that we will be able to fully recover the relative depth map. Indeed, we show that for the case of moving objects, relative depth cannot be obtained everywhere (i.e. at every pixel), but only at points where the local intensity gradient is parallel to a given direction. But a full depth map is not always required. We only need the values of the depth that are relevant to the task at hand.


PAPER  ORGANIZATION 

We define the relative depth from motion problem as follows: 'Given an active observer that can collect a series of images of a scene, to recover the relative depths of objects (or features) in the scene.' (An active observer controls the geometric parameters of its sensory apparatus, thus introducing constraints on its sensory data.)

Since the input to the perceptual process is the normal flow, and the normal flow field contains, in general, less information than the motion field, to solve the problem we need to transfer much of the computation to the activity of the observer. A geometric model of the observer is given in Figure 2. Notice that the camera is resting on a platform ('neck') with six degrees of freedom (actually only one of the degrees is used), and the camera can rotate around its x and y axes (saccades). (However, in this work the only activity required is acceleration along the optical axis.)

The organization of the paper reflects the increasing difficulty of the problem as the motion of the object in view becomes more complex. The following section is devoted to the case of stationary objects. It is shown that if the observer moves along its optical axis, relative depth is easily obtained from the normal flow. Then we study the problem for the case of an object translating parallel to the image plane, deal with the case where the object is moving with a general translation, and analyse the general case. We assume that independently moving objects can be detected and localized on the image. This problem, which is nontrivial if the observer is moving, is addressed elsewhere.

Image and Vision Computing Volume 12 Number 7 September 1994

How normal flow constrains relative depth: L Huang and Y Aloimonos

Figure 3 The camera moves towards the objects in a scene with velocity $V_c$


STATIONARY OBJECTS

Let the camera move towards the scene with velocity $V_c$ along its optical axis. Let the image point $p(x, y)$ be the projection of the 3D point $P(X, Y, Z)$. After time $dt$, $P'(X, Y, Z - V_c\,dt)$, which is the new position of $P$, projects to $p'(x', y')$. Using the relations of perspective projection and assuming unit focal length, we have (Figure 3):

$$ x = \frac{X}{Z} \qquad (1) $$

$$ y = \frac{Y}{Z} \qquad (2) $$

$$ x' = \frac{X}{Z - V_c\,dt} \qquad (3) $$

$$ y' = \frac{Y}{Z - V_c\,dt} \qquad (4) $$

Thus we can obtain the motion velocity of image point $p(x, y)$ as:

$$ v_x = \lim_{dt \to 0} \frac{x' - x}{dt} = \frac{X V_c}{Z^2} = \frac{x V_c}{Z} \qquad (5) $$

Similarly, we have:

$$ v_y = \lim_{dt \to 0} \frac{y' - y}{dt} = \frac{y V_c}{Z} \qquad (6) $$

Suppose the unit normal vector (i.e. the direction of the image gradient) at $p(x, y)$ is $(n_x, n_y)$. The normal flow is the projection of the motion field on the unit normal vector. Thus we have the following relationship between the motion velocity and the normal flow:

$$ v_n = v_x n_x + v_y n_y \qquad (7) $$

Substituting equations (5) and (6) into equation (7) gives $v_n = (V_c/Z)(x n_x + y n_y)$, that is:

$$ \frac{V_c}{Z} = \frac{v_n}{x n_x + y n_y} \qquad (8) $$

As the camera is the only moving object in the scene, and all the objects are stationary, $V_c$ is the same for all image points. Thus we can use equation (8) to decide which object or feature is closer.


OBJECT TRANSLATING PARALLEL TO THE FOCAL PLANE

Here we study the case where the object is translating parallel to the focal plane with velocities $V_x$, $V_y$ along the $x$ and $y$ axes respectively, while the camera is moving towards the object with velocity $V_c$ along the $z$ axis. The velocity of the object with respect to the camera is $(V_x, V_y, -V_c)$. Assume that point $P(X, Y, Z)$ projects to $p(x, y)$ at time $t$, and after time $dt$ the same point $P'(X + V_x\,dt, Y + V_y\,dt, Z - V_c\,dt)$ projects to $p'(x', y')$; then we have (see Figure 4):

$$ x = \frac{X}{Z} \qquad (9) $$

$$ y = \frac{Y}{Z} \qquad (10) $$

$$ x' = \frac{X + V_x\,dt}{Z - V_c\,dt} \qquad (11) $$

$$ y' = \frac{Y + V_y\,dt}{Z - V_c\,dt} \qquad (12) $$

Thus we can obtain the motion velocity of image point $p(x, y)$ as:

$$ v_x = \lim_{dt \to 0} \frac{x' - x}{dt} = \frac{V_x}{Z} + \frac{x V_c}{Z} \qquad (13) $$

Similarly, we have:

$$ v_y = \lim_{dt \to 0} \frac{y' - y}{dt} = \frac{V_y}{Z} + \frac{y V_c}{Z} \qquad (14) $$

According to equation (7) we have:

$$ v_n = \frac{V_c}{Z}(x n_x + y n_y) + \frac{V_x n_x + V_y n_y}{Z} \qquad (15) $$

Figure 4 The object is moving parallel to the focal plane
While we cannot immediately recover $(V_x/V_c, V_y/V_c)$ from the images, this vector is parallel to the direction of motion of the object in the $xy$-plane, $(V_x, V_y)$. In the Appendix we show how to estimate the direction of $(V_x, V_y)$ (i.e. $V_x/V_y$) in the general case. Note that in natural scenes, normal flows are available in all directions. If we select a normal vector from the image of the object that is perpendicular to the direction of motion, the second term of equation (15) will be zero. Thus for objects moving parallel to the focal plane, we obtain the direction of motion $(V_x, V_y)$ (see Appendix). Then, for normal flows that are perpendicular to the direction of motion, we have:

$$ \frac{V_c}{Z} = \frac{v_n}{x n_x + y n_y} \qquad (16) $$

It is noteworthy that partial 3D motion information ($V_x/V_y$) is utilized in this case.
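Equations (8) and (16) reduce relative depth to a ratio computed from a single normal-flow measurement. The following minimal Python sketch generates synthetic normal flows from the motion model (unit focal length, with $V_x = V_y = 0$ for a stationary point), recovers $V_c/Z$ as $v_n/(x n_x + y n_y)$, and orders the points from nearest to farthest. The point coordinates, velocities and helper names are hypothetical, and the direction $(V_x, V_y)$ is simply assumed known here; in practice it would be estimated as described in the Appendix.

```python
import math

def inverse_depth_ratio(vn, x, y, nx, ny, eps=1e-9):
    """Estimate V_c / Z from one normal-flow measurement (equations (8) and (16)).

    Valid for a stationary point, or for a point translating parallel to the
    focal plane when (nx, ny) is perpendicular to (V_x, V_y).
    """
    denom = x * nx + y * ny
    if abs(denom) < eps:
        return None  # this measurement carries no depth information
    return vn / denom

def synthetic_normal_flow(X, Y, Z, Vc, Vx, Vy, nx, ny):
    """Model normal flow from equations (13)-(15); hypothetical test data only."""
    x, y = X / Z, Y / Z
    vx = Vx / Z + x * Vc / Z
    vy = Vy / Z + y * Vc / Z
    return x, y, vx * nx + vy * ny

Vc = 1.0                                # camera speed (unknown in practice, cancels in the ordering)
Vx, Vy = 0.2, 0.1                       # parallel translation of the moving object
g = math.hypot(Vx, Vy)
n_perp = (-Vy / g, Vx / g)              # unit normal perpendicular to (Vx, Vy)

points = {
    "moving A": (1.0, 2.0, 5.0, Vx, Vy, n_perp),
    "moving B": (-1.5, 3.0, 8.0, Vx, Vy, n_perp),
    "static C": (2.0, -1.0, 4.0, 0.0, 0.0, (0.6, 0.8)),
}

estimates = {}
for name, (X, Y, Z, vx_obj, vy_obj, (nx, ny)) in points.items():
    x, y, vn = synthetic_normal_flow(X, Y, Z, Vc, vx_obj, vy_obj, nx, ny)
    estimates[name] = inverse_depth_ratio(vn, x, y, nx, ny)

# A larger V_c / Z means a smaller depth, so sort in decreasing order to list the nearest first.
order = sorted(estimates, key=estimates.get, reverse=True)
print(order)
```

Because $V_c$ is common to all points, only the ordering (or the ratios) of the estimates is meaningful, which is exactly the relative depth information the text refers to.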


OBJECT WITH GENERAL TRANSLATION

When an object is translating with velocity $(V_x, V_y, V_z)$ in the scene while the camera is translating along the $z$ axis with velocity $V_c$, the motion of the object with respect to the coordinate system centred at the camera is $(V_x, V_y, V_z - V_c)$ (Figure 5). According to equation (16), if we select normal flows perpendicular to the direction of motion, we have:

$$ \frac{V_c - V_z}{Z} = \frac{v_n}{x n_x + y n_y} \qquad (17) $$

This measurement is not useful yet because it involves an object-specific velocity $V_z$.

To eliminate the unknown $V_z$, the translational velocity of the moving object along the $z$ axis, we will use two consecutive frames, at times $t_1$ and $t_2$. Assume that the scene consists of a stationary and a moving object; that the stationary object at time $t_1$ is at $P(X_{11}, Y_{11}, Z_{11})$ and at time $t_2$ is at $P(X_{12}, Y_{12}, Z_{12})$; and that the moving object at time $t_1$ is at $P(X_{21}, Y_{21}, Z_{21})$ and at time $t_2$ is at $P(X_{22}, Y_{22}, Z_{22})$. We also assume that the velocity of the camera at time $t_1$ is $V_c$ and at time $t_2$ is $cV_c$, where $c \neq 1$ is a constant. If the camera is accelerating much faster than the object, we can assume that the velocity of the object remains the same across the frames. We select a normal flow $v_n$ that is perpendicular to the direction of the 3D motion in the $xy$ plane. From equations (16) and (17) we have:

$$ \frac{V_c}{Z_{11}} = b_{11} \qquad (18) $$

$$ \frac{cV_c}{Z_{12}} = b_{12} \qquad (19) $$

$$ \frac{V_c - V_z}{Z_{21}} = b_{21} \qquad (20) $$

$$ \frac{cV_c - V_z}{Z_{22}} = b_{22} \qquad (21) $$

and:

$$ Z_{12} = Z_{11} - V_c\,dt \qquad (22) $$

$$ Z_{22} = Z_{21} - (V_c - V_z)\,dt \qquad (23) $$

Figure 5 Moving robot hand approaching a stationary object


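From the above equations, when $dt$ is small, we obtain:

$$ \frac{V_c}{Z_{12}} = \frac{b_{11} - b_{12} + b_{11} b_{12}\,dt}{1 - c - b_{11}\,dt + c\,b_{11}\,dt} \approx \frac{b_{11} - b_{12}}{1 - c} \qquad (24) $$

and:

$$ \frac{V_c}{Z_{22}} = \frac{b_{21} - b_{22} + b_{21} b_{22}\,dt}{1 - c - b_{21}\,dt + c\,b_{21}\,dt} \approx \frac{b_{21} - b_{22}}{1 - c} \qquad (25) $$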

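or:

$$ Q(Z_{12}, V_c, 1 - c) = \frac{V_c(1 - c)}{Z_{12}} = b_{11} - b_{12} \qquad (26) $$

and:

$$ Q(Z_{22}, V_c, 1 - c) = \frac{V_c(1 - c)}{Z_{22}} = b_{21} - b_{22} \qquad (27) $$

where for $i, j = 1, 2$:

$$ b_{ij} = \frac{v_{n,ij}}{x_{ij}\, n_{x,ij} + y_{ij}\, n_{y,ij}} \qquad (28) $$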


Thus we have obtained the relative depth function $Q$ for a moving object and a stationary object. Velocity $V_c$ and velocity ratio $c$ are not known, but since they are parameters of the camera, they remain the same for all objects involved. We assume that it is known whether the camera is moving forward or backward; thus we know the sign of $V_c$. We also assume that it is known whether the camera is accelerating or decelerating; thus we know the sign of $1 - c$. Therefore, we can determine the relative depth of the two objects from equation (28). (It is worth noting that the same results can be achieved if the camera is at first stationary and then moves quickly to a new position, instead of moving and then accelerating. In this case, $V_c \to 0$, $c \to \infty$ and $c V_c \to V_c'$. Thus the relative measures become $Q(Z, V_c') = -V_c'/Z$.)
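The two-frame scheme can be checked numerically. The sketch below uses hypothetical values for $V_c$, $c$, $dt$, $V_z$ and the depths, generates the measurements $b_{ij}$ directly from equations (18)-(23) rather than from real images, forms the first-order measures of equations (26) and (27), and shows that their ratio approximates $Z_{22}/Z_{12}$ without knowledge of $V_c$ or $c$. The exact (not small-$dt$) form of equations (24) and (25) is included only as an algebra check, since it requires $c$, which is unknown in practice. All function names are illustrative.

```python
def q_first_order(b_first, b_second):
    """Equations (26)/(27): Q = V_c (1 - c) / Z at the second frame, to first order in dt."""
    return b_first - b_second

def inverse_depth_exact(b_first, b_second, c, dt):
    """Equations (24)/(25) without the small-dt approximation: V_c / Z at the second frame."""
    num = b_first - b_second + b_first * b_second * dt
    den = 1 - c - b_first * dt + c * b_first * dt
    return num / den

# Hypothetical ground truth for a simulation (not data from the paper).
Vc, c, dt = 1.0, 3.0, 0.01      # camera speed at t1, speed ratio, frame interval
Vz = 0.3                        # moving object's velocity along z
Z11, Z21 = 10.0, 6.0            # depths at t1: stationary object, moving object

# Equations (22)-(23): depths at t2.
Z12 = Z11 - Vc * dt
Z22 = Z21 - (Vc - Vz) * dt

# Equations (18)-(21): what b_ij = v_n / (x n_x + y n_y) would return for each object and frame.
b11, b12 = Vc / Z11, c * Vc / Z12
b21, b22 = (Vc - Vz) / Z21, (c * Vc - Vz) / Z22

Q1 = q_first_order(b11, b12)    # approximately Vc (1 - c) / Z12
Q2 = q_first_order(b21, b22)    # approximately Vc (1 - c) / Z22

# Vc and c cancel in the ratio, leaving the relative depth of the two objects.
print(f"estimated Z22/Z12 = {Q1 / Q2:.4f}, true = {Z22 / Z12:.4f}")
```

With an accelerating forward-moving camera ($c > 1$, $V_c > 0$) both measures are negative, which is why the text requires the signs of $V_c$ and $1 - c$ to interpret $Q$ on its own; the ratio of two measures does not need them.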
OBJECT MOVING IN AN UNRESTRICTED RIGID MANNER

The motion of a rigid object can be described as the sum of a rotation plus a translation. We can choose a point through which the rotation axis passes; this gives a unique rotation and translation describing the rigid motion (in general, there are infinitely many combinations of rotations and translations describing the same rigid motion). Assume that the object is translating with velocity $T = (T_x, T_y, T_z)$ and rotating with angular velocity $R = (R_x, R_y, R_z)$ around a point $P = (X_0, Y_0, Z_0)$ on its surface (Figure 6). The translational components are measured with respect to the world coordinate system, while the angular velocity is measured with respect to the coordinate system whose origin is located at point $(X_0, Y_0, Z_0)$. The camera is moving with velocity $T_c$ along the $z$ axis.

Point $P$ is visible in the image; its image is point $p = (x_0, y_0)$. We attach a coordinate system to the object, at point $P$, with axes parallel to the axes of the observer coordinate system, and express the motion of the object in this object-based coordinate system. Then the velocity of any point $Q = (X, Y, Z)$ on the object, relative to the camera, is:

$$ \begin{pmatrix} \dot X \\ \dot Y \\ \dot Z \end{pmatrix} = \begin{pmatrix} T_x \\ T_y \\ T_z - T_c \end{pmatrix} + \begin{pmatrix} R_x \\ R_y \\ R_z \end{pmatrix} \times \begin{pmatrix} X - X_0 \\ Y - Y_0 \\ Z - Z_0 \end{pmatrix} = \begin{pmatrix} T_x + R_y(Z - Z_0) - R_z(Y - Y_0) \\ T_y + R_z(X - X_0) - R_x(Z - Z_0) \\ T_z - T_c + R_x(Y - Y_0) - R_y(X - X_0) \end{pmatrix} $$

Thus, expressing the optic flow $(v_x, v_y)$ on the image (unit focal length, $x = X/Z$, $y = Y/Z$), we have:

$$ v_x = \frac{\dot X - x \dot Z}{Z}, \qquad v_y = \frac{\dot Y - y \dot Z}{Z} $$

Combining the above equations with equation (7), we obtain for the normal flow:

$$ v_n = \frac{T_x n_x + T_y n_y}{Z} + \frac{T_c - T_z}{Z}(x n_x + y n_y) - R_x\left[\frac{Y - Y_0}{Z}(x n_x + y n_y) + \frac{Z - Z_0}{Z} n_y\right] + R_y\left[\frac{X - X_0}{Z}(x n_x + y n_y) + \frac{Z - Z_0}{Z} n_x\right] + R_z \, \frac{(X - X_0) n_y - (Y - Y_0) n_x}{Z} $$

Figure 6 Object moving in an unrestricted rigid manner

Considering this measurement $v_n$ at the point $x = x_0$, $y = y_0$ (and $Z = Z_0$), all the rotational terms vanish and we have:

$$ v_n = \frac{T_x n_x + T_y n_y}{Z_0} + \frac{T_c - T_z}{Z_0}(x_0 n_x + y_0 n_y) $$

where $(x_0, y_0)$ is the projection of $(X_0, Y_0, Z_0)$. Provided that the direction $(n_x, n_y)$ of the normal flow at $(x_0, y_0)$ is perpendicular to the direction of parallel translation $(T_x, T_y)$, we get:

$$ \frac{T_c - T_z}{Z_0} = \frac{v_n}{x_0 n_x + y_0 n_y} $$
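The key step of this section, that the rotational terms vanish at the image of the pivot point, can be verified numerically. The following Python sketch uses NumPy and hypothetical values for $T$, $R$, $T_c$ and the pivot $(X_0, Y_0, Z_0)$. It computes each point's velocity relative to the camera as $T - (0, 0, T_c) + R \times (Q - P)$, projects it with unit focal length, and evaluates $v_n/(x n_x + y n_y)$ with a normal perpendicular to $(T_x, T_y)$. At the pivot this returns $(T_c - T_z)/Z_0$; at another object point the rotation contaminates the estimate. The helper names are illustrative.

```python
import numpy as np

def point_velocity(Q, P0, T, R, Tc):
    """Velocity of object point Q relative to the camera: (T_x, T_y, T_z - T_c) + R x (Q - P0)."""
    return np.array([T[0], T[1], T[2] - Tc]) + np.cross(R, Q - P0)

def image_normal_flow(Q, Qdot, n):
    """Unit-focal-length projection: flow = (Xdot - x Zdot, Ydot - y Zdot) / Z, projected on n."""
    X, Y, Z = Q
    x, y = X / Z, Y / Z
    flow = np.array([Qdot[0] - x * Qdot[2], Qdot[1] - y * Qdot[2]]) / Z
    return float(flow @ n), x, y

# Hypothetical motion parameters for illustration only.
P0 = np.array([0.4, -0.3, 5.0])      # pivot point on the object surface
T = np.array([0.2, 0.1, 0.05])       # object translation
R = np.array([0.3, -0.2, 0.5])       # angular velocity about P0
Tc = 1.0                             # camera speed along z

n = np.array([-T[1], T[0]]) / np.hypot(T[0], T[1])   # unit normal perpendicular to (T_x, T_y)

# At the pivot the rotation contributes nothing, so the ratio isolates (T_c - T_z) / Z_0.
vn, x0, y0 = image_normal_flow(P0, point_velocity(P0, P0, T, R, Tc), n)
estimate = vn / (x0 * n[0] + y0 * n[1])

# At another object point the rotational terms do not vanish.
Q = P0 + np.array([0.5, 0.2, 0.3])
vn_q, xq, yq = image_normal_flow(Q, point_velocity(Q, P0, T, R, Tc), n)
estimate_off = vn_q / (xq * n[0] + yq * n[1])

print(f"pivot: {estimate:.4f} vs {(Tc - T[2]) / P0[2]:.4f}; off pivot: {estimate_off:.4f}")
```

This is why the object coordinate system has to be placed at the point whose image is being measured.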

Then, assuming two frames as before, we obtain:

$$ Q(Z_{22}, T_c, 1 - c) = \frac{T_c(1 - c)}{Z_{22}} = b_{21} - b_{22} $$

where for $i, j = 1, 2$, $b_{ij} = \dfrac{v_{n,ij}}{x_{ij}\, n_{x,ij} + y_{ij}\, n_{y,ij}}$.


We thus see that we can compute at least the quantity $T_c(1 - c)/Z$, where $Z$ is the depth at a point $p = (x, y)$ at which the direction of the normal flow $(n_x, n_y)$ is perpendicular to the direction $(T_x, T_y)$ of parallel translation, when the motion of the object is measured with regard to a coordinate system with origin at the object point whose image is point $p$ and axes parallel to those of the camera coordinate system. Using the technique described in the Appendix, we can find the direction of parallel motion $(T_x, T_y)$ for any position of the object coordinate system and choose that position for which the direction of the normal flow is
Figure 10 Normal flow of a moving robot arm with a stationary camera

Figure 11 Image taken after both the robot arm and the camera have moved to new positions

Figure 12 Normal flow of a moving robot arm and a moving camera


($Z/(V_c(1 - c))$) was 10.230856 for the arm and 10.145772 for the toy, which again agrees with the ground truth.

These experiments demonstrate that the constraints introduced have the potential of giving rise to algorithms that can be used for the robust estimation of relative depth. Naturally, several stability issues need to be examined. It is well known that particular motions of a visual sensor are quite pathological regarding the recovery of structure, while others are more stable. Such geometric facts need to be taken into account when we design active vision techniques and provide the sensor with an activity. In this particular case, the forward motion of the sensor might not be optimal, in the sense that it might not minimize errors in the estimation of relative depth.


SUMMARY  AND  CONCLUSIONS 

We have presented a set of constraints relating relative depth and normal flow, i.e. the projection of the optic flow on the direction of the local intensity gradient, which we showed to be approximately equal to the normal motion field in areas where the magnitude of the intensity gradient is large. The heart of the constraints lies in factoring out the effects of the parallel translation on the normal flow, by making measurements only at places where the normal flow is perpendicular to the parallel translation. Clearly, if nature conspired against this computational theory, it could present it with stimuli having only one or a few orientations, thus making it impossible to find normal flows perpendicular to the direction of parallel translation. However, for most objects in natural environments one can find gradients in almost any direction, and we should note that most moving objects have outlines which provide a (usually large) number of gradient directions. It is important to realize that the procedures described here will never output an incorrect result; they may, however, be unable to produce a result at all, in which case some other process should be used.

For the case of general translation we showed that relative depth can be computed at all points where the intensity gradient is perpendicular to the direction of the parallel translation. For the case of general motion we considered a coordinate system attached to any visible object point. The consequence of this is that at the image of that point the effect of the rotation on the normal flow is zero, and the solution proceeds as before, through the employment of a specific activity (acceleration along the optical axis). Clearly, many such points could be found.

APPENDIX 

Here we describe a technique for finding the direction of parallel translation $(V_x, V_y)$ from image measurements. We treat the problem in the general case (translation plus rotation). This appendix is a short summary of a technique described elsewhere. In addition, we assume that the observer is 'looking' at the moving object, i.e. the object lies on the observer's optical axis. If this is not the case, the observer can always achieve it with a rotation of the camera (saccade). (It is important to realize, however, that such a saccade does not actually have to be implemented; it can be simulated, since the effects of a rotation are independent of depth. It is, of course, assumed here that the detection of the moving object has been accomplished.)

Such a rotation introduces a known contribution to the normal flow. So, we assume that the moving object lies on the optical axis (Figure A1). To describe the motion of the object, we consider a coordinate system attached to it at its point of intersection with the optical axis. As a result, near the image origin the effect of rotation is negligible. Thus, considering a small area around the origin, we expect to find normal flows due to translation only. If we consider for simplicity a closed contour in that area (in an actual implementation one would have to consider all points inside the contour), then there are two possibilities for the pattern of normal flow (assuming that the object is moving closer). (If the object is moving away, the situation is symmetric.) It will be either as in Figure A2 or otherwise (as, for example, in Figures A3 and A4). If the pattern is as in Figure A2, then the FOE lies inside the contour, and thus its coordinates are very small (negligible). Indeed, the FOE lies on the other side of the normal flow (Figure A5).

Figure A1 In actuality, not all lines will pass through the same point. In such a case, angle AOB gives all possible directions. Stability can be achieved if the analysis is done in the dual space, where each line corresponds to a point and a pencil of lines corresponds to a set of collinear points

We need the direction of the vector $(V_x, V_y)$. In fact, in our equations we only had vectors of the form $(V_x/V_c, V_y/V_c)$, which have the same direction as $(V_x, V_y)$ (see Figure A6). But since $(V_x/V_c, V_y/V_c)$ has very small magnitude, the effect is the same, i.e. the quantity $(V_x/V_c) n_x + (V_y/V_c) n_y$ becomes negligible.

If the pattern is not as in Figure A2, there exists a dominant direction of the flow on the image plane. Assuming that the values of the flow are equal inside the small patch, we can compute the value of the flow from the normal flow values (Figures A7 and A8). The direction of the flow at the origin is equal to the direction of parallel translation. Indeed, if $(u, v)$ is the flow at the origin, we have $u = V_x/Z$, $v = V_y/Z$, where $Z$ is the depth of the object point projecting to the origin. Thus $u/v = V_x/V_y$.
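The Appendix leaves open how the flow of the patch is computed from its normal flows. One natural choice, which is our assumption rather than the authors' stated procedure, is a least-squares solve of the constraints $\mathbf{n}_k \cdot (u, v) = v_{n,k}$ under the constant-flow assumption. The sketch below applies it to synthetic, slightly noisy normal flows at the image origin (all values hypothetical) and recovers the direction ratio $u/v = V_x/V_y$.

```python
import numpy as np

def flow_from_normal_flows(normals, vns):
    """Least-squares constant flow (u, v) in a small patch from its normal flows.

    Each measurement k gives one equation n_k . (u, v) = vn_k, so at least two
    distinct gradient directions are required.
    """
    A = np.asarray(normals, dtype=float)   # (K, 2) unit gradient directions
    b = np.asarray(vns, dtype=float)       # (K,) normal-flow magnitudes
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(u), float(v)

# Hypothetical patch at the image origin: parallel translation (Vx, Vy) at depth Z.
Vx, Vy, Z = 0.3, -0.6, 4.0
u_true, v_true = Vx / Z, Vy / Z            # flow at the origin: u = Vx/Z, v = Vy/Z

angles = np.linspace(0.0, np.pi, 12, endpoint=False)
normals = np.column_stack([np.cos(angles), np.sin(angles)])
rng = np.random.default_rng(0)
vns = normals @ np.array([u_true, v_true]) + rng.normal(0.0, 1e-3, size=len(angles))

u_est, v_est = flow_from_normal_flows(normals, vns)
# The flow direction at the origin gives the direction of parallel translation: u/v = Vx/Vy.
print(f"u/v = {u_est / v_est:.3f} (true Vx/Vy = {Vx / Vy:.3f})")
```

Spreading the gradient directions over the half-circle keeps the least-squares system well conditioned; with nearly parallel gradients the problem degenerates into the aperture problem again.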
Figure A8